[ETVK] WebGPU runtime #18808
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18808
@claude review this PR
Claude finished @digantdesai's task in 3m 52s — Code Review: WebGPU Backend Prototype
Overall this is a well-structured prototype that cleanly integrates a WebGPU runtime backend into ExecuTorch. The architecture of reusing the Vulkan serialization format and partitioner is pragmatic for bootstrapping. Below are my findings.

**Critical Issues**

1. **Backend name collision with Vulkan** — The WebGPU backend registers itself as

   ```cpp
   Backend backend{"VulkanBackend", &cls};
   ```

   This means the Vulkan and WebGPU backends are mutually exclusive at link time — linking both into the same binary will cause a duplicate registration error. The TODO.md acknowledges this, but the code has no guard. At minimum, consider a build-time check that prevents enabling both backends together.

2. **Heap allocation in the execute path** —

   ```cpp
   std::vector<std::pair<const void*, size_t>> inputs;
   ...
   std::vector<std::pair<void*, size_t>> outputs;
   ```

   ExecuTorch's runtime is designed for environments where heap allocation during inference is undesirable. Consider using stack-allocated small buffers (e.g., fixed-capacity arrays).

3. The comment says …

**Moderate Issues**

4. …

5. **No bounds checking on value ID access** —

   ```cpp
   WebGPUTensor& get_tensor(int id) {
     return tensors_[id];
   }
   ```

   An out-of-bounds `id` from a corrupt delegate blob leads to undefined behavior.

6. **Exception safety concerns** — The runtime uses `throw std::runtime_error(...)` during graph build; make sure every throw is caught at the backend boundary and converted to an `Error` code, since ExecuTorch core is commonly built without exception support.

7. **`is_available()` is hardcoded** —

   ```cpp
   bool WebGPUBackend::is_available() const {
     return true;
   }
   ```

   This should attempt to check if a WebGPU device can actually be created (or at least if the global context exists). Returning `true` unconditionally misreports availability on machines without a usable GPU.

8. …

**Minor / Style Issues**

9. …

10. **Hardcoded static library check** —

    ```cmake
    if(NOT EXISTS "${WGPU_NATIVE_DIR}/lib/libwgpu_native.a")
    ```

    This assumes a static archive with a Unix naming convention. It won't work on Windows (different library naming and extension).

11. **Shader source duplication** — The WGSL shader exists as both a standalone `.wgsl` file and an embedded C++ string header; the two copies can drift apart.

12. The …

13. Missing …

14. **TODO.md** — Generally TODOs are tracked in issues rather than checked-in markdown files, which tend to go stale. Consider moving this content to a GitHub issue or discussion.

**What Looks Good**

…
Pull request overview
Adds a prototype WebGPU backend to ExecuTorch, enabling delegated GPU execution via wgpu-native while reusing the existing Vulkan delegate serialization format.
Changes:
- Introduce `EXECUTORCH_BUILD_WEBGPU` and wire the WebGPU backend into the top-level CMake build.
- Add a new `backends/webgpu` runtime (backend interface, graph builder/executor, device/context setup) plus a single `aten.add.Tensor` operator implemented in WGSL.
- Add initial tests and helper scripts for exporting a model and running a native (wgpu-native) end-to-end validation.
Reviewed changes
Copilot reviewed 24 out of 26 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/cmake/preset/default.cmake | Adds EXECUTORCH_BUILD_WEBGPU build option. |
| CMakeLists.txt | Conditionally adds the WebGPU backend subdirectory and backend list entry. |
| backends/webgpu/CMakeLists.txt | Defines webgpu_backend library, imports wgpu-native, and adds a native test target. |
| backends/webgpu/README.md | Documents the prototype backend, architecture, and quick start. |
| backends/webgpu/TODO.md | Captures prototype limitations and planned roadmap. |
| backends/webgpu/runtime/WebGPUBackend.h | Declares the backend interface implementation. |
| backends/webgpu/runtime/WebGPUBackend.cpp | Implements backend init/execute/destroy and registers the backend. |
| backends/webgpu/runtime/WebGPUDelegateHeader.h | Declares delegate header parsing for VH00/VK00 blobs. |
| backends/webgpu/runtime/WebGPUDelegateHeader.cpp | Implements VH00 header parsing/validation. |
| backends/webgpu/runtime/WebGPUDevice.h | Declares native WebGPU context creation and global default context APIs. |
| backends/webgpu/runtime/WebGPUDevice.cpp | Implements wgpu-native instance/adapter/device acquisition and teardown. |
| backends/webgpu/runtime/WebGPUGraph.h | Declares graph structure (tensors, dispatches) and execution APIs. |
| backends/webgpu/runtime/WebGPUGraph.cpp | Implements VkGraph (VK00) parsing, buffer creation, dispatch recording, and execution. |
| backends/webgpu/runtime/ops/OperatorRegistry.h | Introduces a simple operator registry and registration macros. |
| backends/webgpu/runtime/ops/OperatorRegistry.cpp | Implements the registry lookup/registration singleton. |
| backends/webgpu/runtime/ops/add/BinaryOp.cpp | Implements aten.add.Tensor via a compute pipeline + uniform params. |
| backends/webgpu/runtime/ops/add/binary_add.wgsl | WGSL shader source for elementwise add with alpha. |
| backends/webgpu/runtime/ops/add/binary_add_wgsl.h | Embeds the WGSL shader as a C++ string constant. |
| backends/webgpu/scripts/setup-wgpu-native.sh | Downloads prebuilt wgpu-native binaries for native testing. |
| backends/webgpu/test/conftest.py | Adds a PyTorch LeafSpec workaround for test runs. |
| backends/webgpu/test/test_build_webgpu.sh | End-to-end script: pytest export, export .pte, build, run native test. |
| backends/webgpu/test/test_webgpu_native.cpp | Native test runner that loads a .pte and checks output correctness. |
| backends/webgpu/test/ops/init.py | Marks the ops test directory as a Python package. |
| backends/webgpu/test/ops/add/test_add.py | Python export tests using VulkanPartitioner and a helper to export a .pte. |
| backends/vulkan/cmake/ShaderLibrary.cmake | Adjusts the glslc presence check to be conditional on EXECUTORCH_BUILD_VULKAN. |
| .gitignore | Ignores backends/webgpu/third-party/ downloads. |
```cmake
# Ensure vulkan_schema is available even when EXECUTORCH_BUILD_VULKAN is OFF.
# The WebGPU backend reuses the Vulkan FlatBuffer serialization format.
if(NOT TARGET vulkan_schema)
  # We need the schema generation from the Vulkan backend. Build only the
  # schema target by including the Vulkan CMakeLists.txt. The full Vulkan
  # backend will only build if EXECUTORCH_BUILD_VULKAN is ON (which gates the
  # vulkan_backend target), but vulkan_schema is unconditionally defined.
  add_subdirectory(
    ${CMAKE_CURRENT_SOURCE_DIR}/../vulkan
    ${CMAKE_CURRENT_BINARY_DIR}/_vulkan_schema
  )
```
add_subdirectory(../vulkan ...) pulls in the full Vulkan CMakeLists.txt, which currently unconditionally builds shader libraries and the vulkan_backend target (not just vulkan_schema). This makes EXECUTORCH_BUILD_WEBGPU=ON effectively require the full Vulkan toolchain (e.g., glslc) and can also introduce duplicate backend registration. Consider factoring vulkan_schema into a standalone CMake include, or adding a schema-only mode/guards in backends/vulkan/CMakeLists.txt so including it here does not build the full Vulkan backend.
Suggested change:

```diff
-# Ensure vulkan_schema is available even when EXECUTORCH_BUILD_VULKAN is OFF.
-# The WebGPU backend reuses the Vulkan FlatBuffer serialization format.
-if(NOT TARGET vulkan_schema)
-  # We need the schema generation from the Vulkan backend. Build only the
-  # schema target by including the Vulkan CMakeLists.txt. The full Vulkan
-  # backend will only build if EXECUTORCH_BUILD_VULKAN is ON (which gates the
-  # vulkan_backend target), but vulkan_schema is unconditionally defined.
-  add_subdirectory(
-    ${CMAKE_CURRENT_SOURCE_DIR}/../vulkan
-    ${CMAKE_CURRENT_BINARY_DIR}/_vulkan_schema
-  )
+# WebGPU reuses the Vulkan FlatBuffer serialization format and therefore
+# requires the vulkan_schema target to be defined before this file is
+# processed. Do not pull in ../vulkan here with add_subdirectory(), because
+# that imports the full Vulkan backend build and can introduce extra
+# toolchain requirements (for example shader compilation tools) as well as
+# duplicate backend registration side effects.
+if(NOT TARGET vulkan_schema)
+  message(FATAL_ERROR
+    "webgpu_backend requires the vulkan_schema target, but it is not "
+    "available. Provide vulkan_schema before including "
+    "backends/webgpu/CMakeLists.txt. Do not use add_subdirectory(../vulkan) "
+    "from here; instead expose vulkan_schema via a schema-only Vulkan CMake "
+    "include or define vulkan_schema earlier in the build.")
```
```cpp
namespace {
auto cls = WebGPUBackend();
Backend backend{"VulkanBackend", &cls};
```
This backend registers under the name "VulkanBackend", which collides with the real Vulkan backend (and register_backend() rejects duplicate names). If both are linked (including indirectly via the add_subdirectory(../vulkan ...) in this PR), one registration will fail and the delegate may run on the wrong backend or not be available. Please enforce mutual exclusion at CMake/config time, or register under a distinct backend name and update the delegate ID/export path accordingly.
Suggested change:

```diff
-Backend backend{"VulkanBackend", &cls};
+Backend backend{"WebGPUBackend", &cls};
```
```cpp
// Parse header to locate flatbuffer and constant data
Result<WebGPUDelegateHeader> header =
    WebGPUDelegateHeader::parse(processed->data());
if (!header.ok()) {
  ET_LOG(Error, "WebGPUDelegateHeader may be corrupt");
  return header.error();
}

const uint8_t* buffer_start =
    reinterpret_cast<const uint8_t*>(processed->data());
const uint8_t* flatbuffer_data = buffer_start + header->flatbuffer_offset;
const uint8_t* constant_data = buffer_start + header->bytes_offset;

// Verify FlatBuffer identifier
if (!vkgraph::VkGraphBufferHasIdentifier(flatbuffer_data)) {
  ET_LOG(
      Error,
      "WebGPU delegate FlatBuffer identifier mismatch (expected VK00)");
  return Error::DelegateInvalidCompatibility;
}
```
WebGPUDelegateHeader offsets are used to compute flatbuffer_data / constant_data without checking they fall within processed->size(). A malformed/corrupt delegate blob could cause out-of-bounds reads (or FlatBuffers identifier checks on invalid memory). Please validate flatbuffer_offset + flatbuffer_size <= processed->size() and bytes_offset + bytes_size <= processed->size() (with overflow-safe arithmetic) before pointer arithmetic.
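A minimal overflow-safe check could look like the sketch below. This is illustrative only: the helper names and field widths are assumptions (the review comment suggests 32-bit offset/size fields), not the PR's actual header type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative sketch: verify that [offset, offset + size) lies entirely
// within a buffer of `total` bytes, doing the arithmetic in 64 bits so the
// addition of two 32-bit fields cannot wrap around.
static bool region_in_bounds(uint32_t offset, uint32_t size, size_t total) {
  const uint64_t end =
      static_cast<uint64_t>(offset) + static_cast<uint64_t>(size);
  return end <= static_cast<uint64_t>(total);
}

// Validate both header regions against processed->size() before any pointer
// arithmetic. Parameter names mirror the review comment, not the real header.
static bool header_regions_valid(
    uint32_t flatbuffer_offset,
    uint32_t flatbuffer_size,
    uint32_t bytes_offset,
    uint32_t bytes_size,
    size_t processed_size) {
  return region_in_bounds(flatbuffer_offset, flatbuffer_size, processed_size) &&
      region_in_bounds(bytes_offset, bytes_size, processed_size);
}
```

The key point is widening to `uint64_t` before adding, so a crafted header like `offset = 0xFFFFFFF0, size = 0x100` fails validation instead of wrapping to a small value.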
```cpp
WGPUBufferDescriptor uniform_desc = {};
uniform_desc.size = sizeof(AddParams);
uniform_desc.usage = WGPUBufferUsage_Uniform | WGPUBufferUsage_CopyDst;
uniform_desc.mappedAtCreation = true;
WGPUBuffer uniform_buffer = wgpuDeviceCreateBuffer(device, &uniform_desc);
void* mapped = wgpuBufferGetMappedRange(uniform_buffer, 0, sizeof(AddParams));
std::memcpy(mapped, &params, sizeof(AddParams));
wgpuBufferUnmap(uniform_buffer);
```
The uniform buffer created for params is never released. Even if the bind group retains its own reference, the original uniform_buffer handle still needs wgpuBufferRelease() (typically after wgpuDeviceCreateBindGroup). Otherwise each graph build leaks a WebGPU buffer object.
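One way to make the "exactly one release per handle" discipline hard to get wrong is a small RAII guard. The sketch below is generic and uses a stub handle type, not the wgpu API; `release_fn` merely stands in for a call such as `wgpuBufferRelease()` after the bind group is created.

```cpp
#include <cassert>

// Illustrative sketch: tie a handle's release to scope exit so early returns
// and exceptions cannot leak it. Handle and ReleaseFn are placeholders.
template <typename Handle, typename ReleaseFn>
class ScopedRelease {
 public:
  ScopedRelease(Handle h, ReleaseFn fn) : handle_(h), fn_(fn) {}
  ~ScopedRelease() { fn_(handle_); }  // release exactly once at scope exit
  ScopedRelease(const ScopedRelease&) = delete;
  ScopedRelease& operator=(const ScopedRelease&) = delete;
  Handle get() const { return handle_; }

 private:
  Handle handle_;
  ReleaseFn fn_;
};
```

With such a guard, the uniform buffer would be released when the graph-build scope ends, regardless of which path exits it.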
```cpp
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>
```
This file uses std::vector, std::min/std::max, and std::exception but does not include the corresponding headers (<vector>, <algorithm>, <exception>). This will fail to compile on standard toolchains that don't indirectly include them.
Suggested change:

```diff
-#include <cmath>
-#include <cstdio>
-#include <cstdlib>
-#include <string>
+#include <algorithm>
+#include <cmath>
+#include <cstdio>
+#include <cstdlib>
+#include <exception>
+#include <string>
+#include <vector>
```
```cpp
for (unsigned i = 0; i < fb_output_ids->size(); i++) {
  int oid = static_cast<int>(fb_output_ids->Get(i));
  output_ids_.push_back(oid);

  // Create staging buffer for output readback
  WGPUBufferDescriptor staging_desc = {};
  staging_desc.size = tensors_[oid].nbytes > 0 ? tensors_[oid].nbytes : 4;
  staging_desc.usage = WGPUBufferUsage_MapRead | WGPUBufferUsage_CopyDst;
```
oid from graph->output_ids() is used to index tensors_[oid] without validating that oid is in-range and refers to a tensor value. A corrupt/malicious FlatBuffer can cause out-of-bounds access or staging buffers sized from the wrong value type. Please validate 0 <= oid < num_vals and value_types_[oid] == Tensor before indexing/creating staging buffers.
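A validation helper in the spirit of this comment might look like the sketch below. The `ValueType` enum mirrors the one declared in the PR's graph header; the function name and error strings are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Mirrors the value-type enum from the graph header (simplified here).
enum class ValueType { Tensor, Int, Double, Bool, Null, String };

// Illustrative sketch: reject ids that are out of range or that do not
// refer to a Tensor value, before any tensors_[oid] indexing happens.
static void validate_output_id(
    int oid, const std::vector<ValueType>& value_types) {
  if (oid < 0 || static_cast<size_t>(oid) >= value_types.size()) {
    throw std::runtime_error(
        "WebGPU backend: output id out of range: " + std::to_string(oid));
  }
  if (value_types[static_cast<size_t>(oid)] != ValueType::Tensor) {
    throw std::runtime_error(
        "WebGPU backend: output id does not refer to a tensor: " +
        std::to_string(oid));
  }
}
```

Calling this at the top of the output loop turns a potential out-of-bounds read into a catchable error during graph build.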
```cpp
for (unsigned i = 0; i < chain->size(); i++) {
  const auto* op_call = chain->Get(i);
  std::string op_name = op_call->name()->str();

  if (!webgpu_operator_registry().has_op(op_name)) {
    throw std::runtime_error(
        "WebGPU backend: unsupported op: " + op_name);
  }
```
op_call->name()->str() assumes op_call and op_call->name() are non-null. FlatBuffers fields are optional; a malformed model could crash here. Please null-check op_call/name() and throw a descriptive error if missing.
```cpp
if (flatbuffer_size == 0) {
  return false;
}
if (bytes_offset < flatbuffer_offset + flatbuffer_size) {
```
is_valid() compares bytes_offset < flatbuffer_offset + flatbuffer_size using 32-bit fields; flatbuffer_offset + flatbuffer_size can overflow and incorrectly pass validation. Consider doing all offset/size arithmetic in uint64_t and (ideally) validating against the actual buffer length (e.g., pass the blob size into parse() or validate in the caller).
Suggested change:

```diff
-if (bytes_offset < flatbuffer_offset + flatbuffer_size) {
+const uint64_t flatbuffer_end =
+    static_cast<uint64_t>(flatbuffer_offset) +
+    static_cast<uint64_t>(flatbuffer_size);
+if (static_cast<uint64_t>(bytes_offset) < flatbuffer_end) {
```
```cpp
tensor.nbytes = numel * vk_datatype_size(vk_tensor->datatype());

// Create GPU buffer
WGPUBufferDescriptor buf_desc = {};
buf_desc.size = tensor.nbytes > 0 ? tensor.nbytes : 4;
buf_desc.usage =
```
tensor.nbytes is computed as numel * vk_datatype_size(...), but vk_datatype_size() returns 0 for unknown/unsupported dtypes. The current code then creates a 4-byte buffer and continues, which can mask delegate incompatibilities and lead to incorrect execution later. Since the WebGPU prototype only supports fp32 today, it would be safer to explicitly validate vk_tensor->datatype() == FLOAT32 (or at least vk_datatype_size(...) > 0) and throw an error when unsupported.
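The size computation could fail fast instead of falling back to a 4-byte buffer. A minimal sketch, with the helper name assumed rather than taken from the PR:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Illustrative sketch: compute a tensor's byte size, treating an element
// size of 0 (the vk_datatype_size() convention for unknown dtypes) as a
// hard error rather than silently creating a placeholder buffer.
static size_t checked_nbytes(size_t numel, size_t element_size) {
  if (element_size == 0) {
    throw std::runtime_error(
        "WebGPU backend: unsupported tensor dtype (element size is 0)");
  }
  return numel * element_size;
}
```

Since the prototype only supports fp32, the equivalent explicit check on `vk_tensor->datatype() == FLOAT32` would be even stricter.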
```diff
@@ -0,0 +1,113 @@
+# WebGPU Backend
+
+Run ExecuTorch models on the GPU via [WebGPU](https://www.w3.org/TR/webgpu/). The backend compiles delegated subgraphs into WGSL compute shaders executed natively through [wgpu-native](https://github.com/gfx-rs/wgpu-native) (Metal on macOS, Vulkan on Linux/Windows).
```
The README states wgpu-native runs on "Vulkan on Linux/Windows", but both the setup script (setup-wgpu-native.sh) and the CMake link logic only handle macOS/Linux (no Windows zip selection, and links dl m pthread in the non-APPLE branch). Either add Windows support or clarify in the README that Windows is not supported yet for this prototype.
Suggested change:

```diff
-Run ExecuTorch models on the GPU via [WebGPU](https://www.w3.org/TR/webgpu/). The backend compiles delegated subgraphs into WGSL compute shaders executed natively through [wgpu-native](https://github.com/gfx-rs/wgpu-native) (Metal on macOS, Vulkan on Linux/Windows).
+Run ExecuTorch models on the GPU via [WebGPU](https://www.w3.org/TR/webgpu/). The backend compiles delegated subgraphs into WGSL compute shaders executed natively through [wgpu-native](https://github.com/gfx-rs/wgpu-native) (Metal on macOS, Vulkan on Linux). Windows is not supported yet in this prototype.
```
SS-JIA left a comment:
Stamp to land now and unblock iteration. As a follow-up I'll investigate how to extend Vulkan's ComputeGraph abstractions to other GPU APIs.
Parses the VH00/VK00 FlatBuffer envelope from the Vulkan partitioner to extract the serialized graph payload.
Operator registry with registration macros, WGSL binary-add shader (plus inline C++ header), and the aten.add.Tensor implementation that creates a compute pipeline and records dispatch.
Buffer management, pipeline creation, and compute dispatch. Parses the Vulkan FlatBuffer delegate blob and builds a runnable graph of compute passes.
BackendInterface implementation that wires init/execute into ExecuTorch. Registers as "VulkanBackend" to consume .pte files from the Vulkan partitioner directly.
CMake integration: backend library target, Vulkan FlatBuffer schema dependency, root build flags, and glslc guard fix.
Export tests verify fp32 torch.add models produce a .pte with VulkanBackend delegate: 2D/3D/4D shapes, broadcasting, self-add, scalar add, and chained adds. Includes TODO with architecture notes and next steps.
WebGPUDevice wraps wgpu-native (Metal/Vulkan) behind a uniform C++ interface. Includes a setup script that downloads prebuilt wgpu-native binaries.
Wire wgpu-native into the CMake build and integrate WebGPUDevice into the compute graph for native Metal/Vulkan execution.
C++ test runner that loads a .pte and runs inference via wgpu-native. End-to-end build script that exports a model, builds the native runtime, and validates output.
Pull request overview
Copilot reviewed 24 out of 26 changed files in this pull request and generated 9 comments.
```cpp
void OperatorRegistry::register_op(
    const std::string& name,
    const OpFunction& fn) {
  table_.insert(std::make_pair(name, fn));
```
OperatorRegistry::register_op uses unordered_map::insert(), which silently ignores duplicate registrations for the same op name. This can hide accidental double-registration and leave an unexpected implementation active. Consider detecting duplicates and either overwriting intentionally or throwing/logging when an op is already registered.
Suggested change:

```diff
-table_.insert(std::make_pair(name, fn));
+const auto [it, inserted] = table_.insert(std::make_pair(name, fn));
+if (!inserted) {
+  throw std::runtime_error(
+      "WebGPU OperatorRegistry: duplicate operator registration: " + name);
+}
```
```cpp
// Request adapter using AllowSpontaneous mode (fires during
// wgpuInstanceProcessEvents or any other API call).
AdapterResult adapter_result;
WGPURequestAdapterCallbackInfo adapter_cb = {};
adapter_cb.mode = WGPUCallbackMode_AllowSpontaneous;
adapter_cb.callback = on_adapter_request;
adapter_cb.userdata1 = &adapter_result;

wgpuInstanceRequestAdapter(ctx.instance, nullptr, adapter_cb);
while (!adapter_result.done) {
  wgpuInstanceProcessEvents(ctx.instance);
}
```
create_webgpu_context() busy-waits on adapter/device requests with while (!*_result.done) { wgpuInstanceProcessEvents(...) } and has no timeout/failsafe. If the callback never fires (driver/runtime issue), this will hang indefinitely. Please add a bounded timeout and throw a clear error when exceeded.
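A bounded version of the wait loop could follow the shape below. This is a generic sketch: the `process_events` callable stands in for `wgpuInstanceProcessEvents(...)`, and the timeout value is an assumption, not something the PR specifies.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Illustrative sketch: poll process_events() until `done` flips or a
// deadline passes, instead of spinning forever on a callback that may
// never fire.
static void wait_with_timeout(
    std::atomic<bool>& done,
    const std::function<void()>& process_events,
    std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!done.load()) {
    process_events();
    if (std::chrono::steady_clock::now() >= deadline) {
      throw std::runtime_error(
          "WebGPU backend: timed out waiting for adapter/device callback");
    }
    // Brief sleep keeps the loop from pegging a core while polling.
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
}
```

The same pattern would apply to both the adapter and device request loops in `create_webgpu_context()`.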
```cpp
namespace {
auto cls = WebGPUBackend();
Backend backend{"VulkanBackend", &cls};
static auto success_with_compiler = register_backend(backend);
```
WebGPU backend registers itself under the name "VulkanBackend". If EXECUTORCH_BUILD_VULKAN and EXECUTORCH_BUILD_WEBGPU are both enabled, register_backend() will reject the duplicate name and one backend will silently fail to register (the return value is currently ignored). Please enforce mutual exclusivity in CMake (or register under a distinct name and update export path), and/or check the register_backend() return value and surface a clear error.
Suggested change:

```diff
-static auto success_with_compiler = register_backend(backend);
+static const bool success_with_compiler = []() {
+  const Error err = register_backend(backend);
+  if (err != Error::Ok) {
+    ET_LOG(
+        Error,
+        "Failed to register WebGPU backend under name '%s' (possible duplicate registration). Error code: 0x%x",
+        "VulkanBackend",
+        static_cast<unsigned int>(err));
+    return false;
+  }
+  return true;
+}();
```
```cpp
const size_t num_inputs = graph->input_ids().size();
const size_t num_outputs = graph->output_ids().size();

// Copy inputs from EValue tensors to GPU buffers
std::vector<std::pair<const void*, size_t>> inputs;
inputs.reserve(num_inputs);
for (size_t i = 0; i < num_inputs; i++) {
  const auto& tensor = args[i]->toTensor();
  inputs.emplace_back(tensor.const_data_ptr(), tensor.nbytes());
}
graph->copy_inputs(inputs);

// Execute the compute graph
graph->execute();

// Copy outputs from GPU staging buffers to EValue tensor data pointers
std::vector<std::pair<void*, size_t>> outputs;
outputs.reserve(num_outputs);
for (size_t i = 0; i < num_outputs; i++) {
  const size_t arg_idx = num_inputs + i;
  auto& tensor = args[arg_idx]->toTensor();
  outputs.emplace_back(tensor.mutable_data_ptr(), tensor.nbytes());
}
```
execute() assumes outputs start at index num_inputs (arg_idx = num_inputs + i). Other backends compute the output offset as args.size() - num_outputs, which is more robust if args includes extra values or input count differs from args layout. Please compute output_offset from args.size(), and validate args.size() >= num_inputs + num_outputs before indexing.
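The suggested offset computation can be isolated as a small helper. A sketch under the assumptions in this comment (outputs occupy the tail of the args array):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Illustrative sketch: derive the index of the first output slot from the
// argument list length, rather than assuming outputs begin right after the
// inputs, and validate the list is large enough before any indexing.
static size_t output_offset(
    size_t args_size, size_t num_inputs, size_t num_outputs) {
  if (args_size < num_inputs + num_outputs) {
    throw std::runtime_error(
        "WebGPU backend: runtime args smaller than inputs + outputs");
  }
  return args_size - num_outputs;
}
```

When `args` contains exactly inputs followed by outputs the two formulas agree; they diverge only when extra values are interleaved, which is precisely the case this guards against.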
```cpp
std::string op_name = op_call->name()->str();

if (!webgpu_operator_registry().has_op(op_name)) {
  throw std::runtime_error("WebGPU backend: unsupported op: " + op_name);
}

const auto* fb_args = op_call->args();
std::vector<int> args;
if (fb_args) {
  for (unsigned j = 0; j < fb_args->size(); j++) {
    args.push_back(static_cast<int>(fb_args->Get(j)));
  }
```
operator name is read via op_call->name()->str() without checking op_call->name() for null. FlatBuffers strings can be null in malformed/corrupt inputs, which would crash here. Please validate required fields (name, args) during build and return a compatibility error instead of crashing.
Suggested change:

```diff
-std::string op_name = op_call->name()->str();
-if (!webgpu_operator_registry().has_op(op_name)) {
-  throw std::runtime_error("WebGPU backend: unsupported op: " + op_name);
-}
-const auto* fb_args = op_call->args();
-std::vector<int> args;
-if (fb_args) {
-  for (unsigned j = 0; j < fb_args->size(); j++) {
-    args.push_back(static_cast<int>(fb_args->Get(j)));
-  }
+if (!op_call) {
+  throw std::runtime_error(
+      "WebGPU backend: incompatible graph: operator call is missing");
+}
+const auto* fb_name = op_call->name();
+if (!fb_name) {
+  throw std::runtime_error(
+      "WebGPU backend: incompatible graph: operator name is missing");
+}
+std::string op_name = fb_name->str();
+if (!webgpu_operator_registry().has_op(op_name)) {
+  throw std::runtime_error("WebGPU backend: unsupported op: " + op_name);
+}
+const auto* fb_args = op_call->args();
+if (!fb_args) {
+  throw std::runtime_error(
+      "WebGPU backend: incompatible graph: args are missing for op: " +
+      op_name);
+}
+std::vector<int> args;
+for (unsigned j = 0; j < fb_args->size(); j++) {
+  args.push_back(static_cast<int>(fb_args->Get(j)));
```
```sh
echo "Downloading wgpu-native ${WGPU_VERSION} for ${PLATFORM}-${WGPU_ARCH}..."
TMPDIR_DL="$(mktemp -d)"
trap "rm -rf ${TMPDIR_DL}" EXIT

curl -sL "${URL}" -o "${TMPDIR_DL}/${ZIP_NAME}"

mkdir -p "${WGPU_DIR}"
unzip -qo "${TMPDIR_DL}/${ZIP_NAME}" -d "${WGPU_DIR}"
```
setup-wgpu-native.sh downloads and unzips a release artifact but doesn't check curl/unzip success (curl is run with -sL, which can still write an HTML error page, and unzip will still exit 0 in some cases if the file isn't a valid zip). Consider adding curl -f (fail on HTTP errors) and validating the expected output file (lib/libwgpu_native.a) exists after unzip to make failures actionable.
```cpp
// Access tensors by value ID (used by op implementations).
WebGPUTensor& get_tensor(int id) {
  return tensors_[id];
}
const WebGPUTensor& get_tensor(int id) const {
  return tensors_[id];
}

// Access scalar values stored during graph build.
double get_double(int id) const {
  return doubles_[id];
}
int64_t get_int(int id) const {
  return ints_[id];
}

WGPUDevice device() const {
  return device_;
}
WGPUQueue queue() const {
  return queue_;
}

void add_dispatch(WebGPUDispatch dispatch) {
  dispatches_.push_back(dispatch);
}

void add_uniform_buffer_bytes(size_t bytes) {
  uniform_buffer_bytes_ += bytes;
}

void set_instance(WGPUInstance instance) {
  instance_ = instance;
}
void set_device(WGPUDevice device) {
  device_ = device;
}

WebGPUMemoryStats memory_stats() const;

int num_values() const {
  return static_cast<int>(value_types_.size());
}

enum class ValueType { Tensor, Int, Double, Bool, Null, String };

ValueType get_value_type(int id) const {
  return value_types_[id];
}
```
get_tensor()/get_value_type()/get_int()/get_double() index internal vectors with operator[] and do not validate that the value id is in range. Since ids come from the delegate flatbuffer, malformed/corrupt programs could cause out-of-bounds access and a hard crash. Please add bounds checks (e.g., use at() or explicit range checks) and fail build/init gracefully (throw to be caught by init()).
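A range-checked accessor in the spirit of this comment might look like the sketch below. `WebGPUTensor` is stubbed to a minimal struct here; the class name and layout are illustrative, not the PR's.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stub of the real tensor record, just enough to illustrate the check.
struct WebGPUTensor {
  size_t nbytes = 0;
};

// Illustrative sketch: accessors validate the id before indexing so a
// corrupt delegate blob produces a catchable exception during graph
// build/init rather than undefined behavior.
class CheckedGraph {
 public:
  explicit CheckedGraph(std::vector<WebGPUTensor> tensors)
      : tensors_(std::move(tensors)) {}

  WebGPUTensor& get_tensor(int id) {
    if (id < 0 || static_cast<size_t>(id) >= tensors_.size()) {
      throw std::out_of_range(
          "WebGPU backend: tensor id out of range: " + std::to_string(id));
    }
    return tensors_[static_cast<size_t>(id)];
  }

 private:
  std::vector<WebGPUTensor> tensors_;
};
```

The same pattern extends directly to `get_value_type()`, `get_int()`, and `get_double()`, each checking against its own vector's size.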
```cmake
if(APPLE)
  target_link_libraries(
    webgpu_backend PRIVATE "-framework Metal" "-framework QuartzCore"
    "-framework CoreGraphics" "-framework Foundation"
  )
else()
  target_link_libraries(webgpu_backend PRIVATE dl m pthread)
endif()

target_compile_options(webgpu_backend PRIVATE -fexceptions)
```
This CMakeLists.txt unconditionally adds GCC/Clang-only flags ("-fexceptions") and links POSIX-only libs (dl/m/pthread) under the non-APPLE branch. On WIN32/MSVC this will fail to configure/build if EXECUTORCH_BUILD_WEBGPU is enabled. Please add proper compiler/platform guards (e.g., /EHsc on MSVC, and appropriate Windows system libs) or explicitly disable WebGPU on unsupported platforms with a clear message.
```cpp
float max_error = 0.0f;
int check_count = std::min(size, 1024);
for (int i = 0; i < check_count; i++) {
  float expected = a_data[i] + b_data[i];
  float error = std::abs(out_data[i] - expected);
  max_error = std::max(max_error, error);
}
```
test_webgpu_native.cpp uses std::min/std::max but doesn't include `<algorithm>`. This can fail to compile on some standard libraries (notably libc++) because these functions are only guaranteed to be declared via `<algorithm>`. Please add the missing include to avoid relying on transitive includes.
wgpu prototype